ESS330 Daily Assignment 22

Lecture 22: Introduction to Time Series Forecasting in ‘R’

Author

Neva Morgan

Published

April 23, 2025

Objective:

Use the Poudre River set to understand new methods for analyzing time series forecasting from the modeltime package.

OMG THE LIBRARIES ARE BUILDING!

library(modeltime)
Warning: package 'modeltime' was built under R version 4.4.3
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.3
Warning: package 'ggplot2' was built under R version 4.4.3
Warning: package 'tidyr' was built under R version 4.4.3
Warning: package 'readr' was built under R version 4.4.3
Warning: package 'purrr' was built under R version 4.4.3
Warning: package 'dplyr' was built under R version 4.4.3
Warning: package 'lubridate' was built under R version 4.4.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
Warning: package 'tidymodels' was built under R version 4.4.3
── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ──
✔ broom        1.0.8     ✔ rsample      1.3.0
✔ dials        1.4.0     ✔ tune         1.3.0
✔ infer        1.0.8     ✔ workflows    1.2.0
✔ modeldata    1.4.0     ✔ workflowsets 1.1.0
✔ parsnip      1.3.1     ✔ yardstick    1.3.2
✔ recipes      1.3.0     
Warning: package 'broom' was built under R version 4.4.3
Warning: package 'dials' was built under R version 4.4.3
Warning: package 'infer' was built under R version 4.4.3
Warning: package 'parsnip' was built under R version 4.4.3
Warning: package 'recipes' was built under R version 4.4.3
Warning: package 'rsample' was built under R version 4.4.3
Warning: package 'tune' was built under R version 4.4.3
Warning: package 'workflows' was built under R version 4.4.3
Warning: package 'yardstick' was built under R version 4.4.3
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
library(timeSeries)
Warning: package 'timeSeries' was built under R version 4.4.3
Loading required package: timeDate

Attaching package: 'timeSeries'

The following object is masked from 'package:dplyr':

    lag

The following objects are masked from 'package:graphics':

    lines, points
library(ggplot2)
library(tsibble)
Warning: package 'tsibble' was built under R version 4.4.3
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr

Attaching package: 'tsibble'

The following object is masked from 'package:lubridate':

    interval

The following objects are masked from 'package:base':

    intersect, setdiff, union
library(lubridate)
library(forecast)
Warning: package 'forecast' was built under R version 4.4.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 

Attaching package: 'forecast'

The following object is masked from 'package:yardstick':

    accuracy
library(feasts)
Warning: package 'feasts' was built under R version 4.4.3
Loading required package: fabletools
Warning: package 'fabletools' was built under R version 4.4.3

Attaching package: 'fabletools'

The following object is masked from 'package:yardstick':

    accuracy

The following object is masked from 'package:parsnip':

    null_model

The following objects are masked from 'package:infer':

    generate, hypothesize
library(zoo)
Warning: package 'zoo' was built under R version 4.4.3

Attaching package: 'zoo'

The following object is masked from 'package:tsibble':

    index

The following object is masked from 'package:timeSeries':

    time<-

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
library(xts)
Warning: package 'xts' was built under R version 4.4.3

######################### Warning from 'xts' package ##########################
#                                                                             #
# The dplyr lag() function breaks how base R's lag() function is supposed to  #
# work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
# source() into this session won't work correctly.                            #
#                                                                             #
# Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
# conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
# dplyr from breaking base R's lag() function.                                #
#                                                                             #
# Code in packages is not affected. It's protected by R's namespace mechanism #
# Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
#                                                                             #
###############################################################################

Attaching package: 'xts'

The following objects are masked from 'package:dplyr':

    first, last
library(fable)
Warning: package 'fable' was built under R version 4.4.3
library(prophet)
Warning: package 'prophet' was built under R version 4.4.3
Loading required package: Rcpp

Attaching package: 'Rcpp'

The following object is masked from 'package:rsample':

    populate

Loading required package: rlang
Warning: package 'rlang' was built under R version 4.4.3

Attaching package: 'rlang'

The following objects are masked from 'package:purrr':

    %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
    flatten_raw, invoke, splice
library(purrr)
library(timetk)
Warning: package 'timetk' was built under R version 4.4.3
library(caret)
Warning: package 'caret' was built under R version 4.4.3
Loading required package: lattice

Attaching package: 'caret'

The following objects are masked from 'package:fabletools':

    MAE, RMSE

The following objects are masked from 'package:yardstick':

    precision, recall, sensitivity, specificity

The following object is masked from 'package:purrr':

    lift
library(rsample)

Re-reading in data from Assignment 21:

library(dataRetrieval)
Warning: package 'dataRetrieval' was built under R version 4.4.3
# Example: Cache la Poudre River at Mouth (USGS site 06752260)
poudre_flow <- readNWISdv(siteNumber = "06752260",
                          parameterCd = "00060",
                          startDate = "2013-01-01",
                          endDate = "2023-12-31") |>
  renameNWISColumns() |>
  mutate(Date = yearmonth(Date)) |>
  mutate(Date = ym(Date)) |>  
  group_by(Date) |>
  summarize(Flow = mean(Flow)) |> 
  ungroup() 
GET:https://waterservices.usgs.gov/nwis/dv/?site=06752260&format=waterml%2C1.1&ParameterCd=00060&StatCd=00003&startDT=2013-01-01&endDT=2023-12-31
  1. Create a Time Series Split:
# Making the Tibble of Poudre Flow - Time Series Split:
pf_tbl <- as_tibble(poudre_flow, index = Date)

splits <- time_series_split(pf_tbl, assess = "12 months", cumulative = TRUE)
Using date_var: Date
training <- training(splits)
testing <- testing(splits)
  1. Specifying my Models:
mods <- list(
  arima_reg() |>  set_engine("auto_arima"),
  
  arima_boost(min_n = 2, learn_rate = 0.015) |> set_engine(engine = "auto_arima_xgboost"),
  
  prophet_reg() |> set_engine("prophet"),
  
  prophet_boost() |> set_engine("prophet_xgboost"),
  
  # Exponential Smoothing State Space model
  exp_smoothing() |> set_engine(engine = "ets"),
  
  # Multivariate Adaptive Regression Spline model
  mars(mode = "regression") |> set_engine("earth") 
)
  1. Fitting the Models
library(earth)
Warning: package 'earth' was built under R version 4.4.3
Loading required package: Formula
Loading required package: plotmo
Warning: package 'plotmo' was built under R version 4.4.3
Loading required package: plotrix

Attaching package: 'plotrix'
The following object is masked from 'package:scales':

    rescale
models <- map(mods, ~ fit(.x, Flow ~ Date, data = training))
frequency = 12 observations per 1 year
frequency = 12 observations per 1 year
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
frequency = 12 observations per 1 year
  1. Building the Modeltime Table
(models_tbl <- as_modeltime_table(models))
# Modeltime Table
# A tibble: 6 × 3
  .model_id .model   .model_desc            
      <int> <list>   <chr>                  
1         1 <fit[+]> ARIMA(0,0,2)(0,1,2)[12]
2         2 <fit[+]> ARIMA(0,0,2)(0,1,2)[12]
3         3 <fit[+]> PROPHET                
4         4 <fit[+]> PROPHET                
5         5 <fit[+]> ETS(M,N,M)             
6         6 <fit[+]> EARTH                  
  1. Calibrating the Models
(calibration_table <- modeltime_calibrate(models_tbl, testing, quiet = FALSE))
# Modeltime Table
# A tibble: 6 × 5
  .model_id .model   .model_desc             .type .calibration_data
      <int> <list>   <chr>                   <chr> <list>           
1         1 <fit[+]> ARIMA(0,0,2)(0,1,2)[12] Test  <tibble [12 × 4]>
2         2 <fit[+]> ARIMA(0,0,2)(0,1,2)[12] Test  <tibble [12 × 4]>
3         3 <fit[+]> PROPHET                 Test  <tibble [12 × 4]>
4         4 <fit[+]> PROPHET                 Test  <tibble [12 × 4]>
5         5 <fit[+]> ETS(M,N,M)              Test  <tibble [12 × 4]>
6         6 <fit[+]> EARTH                   Test  <tibble [12 × 4]>
  1. Accuracy
modeltime_accuracy(calibration_table) |>
  arrange(mae)
# A tibble: 6 × 9
  .model_id .model_desc             .type   mae  mape  mase smape  rmse     rsq
      <int> <chr>                   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1         1 ARIMA(0,0,2)(0,1,2)[12] Test   105.  86.2 0.380  50.6  246. 0.965  
2         2 ARIMA(0,0,2)(0,1,2)[12] Test   105.  86.2 0.380  50.6  246. 0.965  
3         5 ETS(M,N,M)              Test   110.  64.6 0.396  46.7  286. 0.828  
4         3 PROPHET                 Test   162. 385.  0.584 161.   229. 0.855  
5         4 PROPHET                 Test   162. 385.  0.584 161.   229. 0.855  
6         6 EARTH                   Test   205. 132.  0.738  92.4  456. 0.00697
  1. Re-Reading data into Actual Forecast:
glimpse(testing)
Rows: 12
Columns: 2
$ Date <date> 2023-01-01, 2023-02-01, 2023-03-01, 2023-04-01, 2023-05-01, 2023…
$ Flow <dbl> 7.291935, 21.900357, 28.806452, 76.760000, 648.500000, 1499.23333…
glimpse(pf_tbl)
Rows: 132
Columns: 2
$ Date <date> 2013-01-01, 2013-02-01, 2013-03-01, 2013-04-01, 2013-05-01, 2013…
$ Flow <dbl> 18.125806, 18.046429, 8.208710, 5.940667, 332.579677, 299.989333,…
#Reading in actual data

pf_future <- readNWISdv(siteNumber = "06752260",
                         parameterCd = "00060",
                         startDate = "2024-01-01",
                         endDate = "2025-04-01") |> 
  renameNWISColumns() |>
  mutate(Date = floor_date(Date, "month"),
         Flow = as.numeric(Flow)) |> 
  group_by(Date) |> 
  summarise(Flow = mean(Flow)) |> 
  ungroup() 
GET:https://waterservices.usgs.gov/nwis/dv/?site=06752260&format=waterml%2C1.1&ParameterCd=00060&StatCd=00003&startDT=2024-01-01&endDT=2025-04-01
  1. Forecasting
#Previous data
forecast <- modeltime_forecast(
  object = calibration_table,
  h = "12 months",
  actual_data = pf_tbl
)

plot_modeltime_forecast(forecast)
modeltime_forecast(
  object = calibration_table,
  h = "12 months",
  actual_data = pf_tbl
)
# Forecast Results
  
Conf Method: conformal_default | Conf Interval: 0.95 | Conf By ID: FALSE
(GLOBAL CONFIDENCE)
# A tibble: 204 × 7
   .model_id .model_desc .key   .index      .value .conf_lo .conf_hi
       <int> <chr>       <fct>  <date>       <dbl>    <dbl>    <dbl>
 1        NA ACTUAL      actual 2013-01-01   18.1        NA       NA
 2        NA ACTUAL      actual 2013-02-01   18.0        NA       NA
 3        NA ACTUAL      actual 2013-03-01    8.21       NA       NA
 4        NA ACTUAL      actual 2013-04-01    5.94       NA       NA
 5        NA ACTUAL      actual 2013-05-01  333.         NA       NA
 6        NA ACTUAL      actual 2013-06-01  300.         NA       NA
 7        NA ACTUAL      actual 2013-07-01   75.6        NA       NA
 8        NA ACTUAL      actual 2013-08-01   48.8        NA       NA
 9        NA ACTUAL      actual 2013-09-01 1085.         NA       NA
10        NA ACTUAL      actual 2013-10-01  146.         NA       NA
# ℹ 194 more rows
  1. Visualizing
plot_modeltime_forecast(forecast)
  1. Refit to Full Dataset & Forecast Forward
refit_tbl <- calibration_table |>
  modeltime_refit(data = pf_tbl)
frequency = 12 observations per 1 year
frequency = 12 observations per 1 year
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
frequency = 12 observations per 1 year
refit_tbl |>
  modeltime_forecast(h = "12 months", actual_data = pf_tbl) |>
  plot_modeltime_forecast()
  1. RERUN THIS THING:
# New Data
future_forecast <- refit_tbl |>
  modeltime_forecast(h = "12 months", actual_data = pf_future)

glimpse(future_forecast)
Rows: 88
Columns: 7
$ .model_id   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .model_desc <chr> "ACTUAL", "ACTUAL", "ACTUAL", "ACTUAL", "ACTUAL", "ACTUAL"…
$ .key        <fct> actual, actual, actual, actual, actual, actual, actual, ac…
$ .index      <date> 2024-01-01, 2024-02-01, 2024-03-01, 2024-04-01, 2024-05-0…
$ .value      <dbl> 11.01097, 24.53172, 49.60645, 104.31067, 194.36452, 383.43…
$ .conf_lo    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .conf_hi    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
refit_tbl <- calibration_table |>
  modeltime_refit(data = pf_tbl)
frequency = 12 observations per 1 year
frequency = 12 observations per 1 year
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
frequency = 12 observations per 1 year
  1. Merging Predicted and Actual:

I was having serious issues with trying to find .model_desc from my forecast, I ended up just providing what I have, in my graph below prophet does exist, but this is the code and data I was trying to provide with the subsequent error, although .model_desc WAS DEFNINITELY IN FUTURE FORECAST AND COMPARE DF! > compare_df <- future_forecast |> + filter(.model_desc %in% c(“UPDATE: ARIMA(0,0,2)(0,1,1)[12]”, “PROPHET”)) |> + select(Date = .index, .model_desc, .value) |> + left_join(pf_future, by = “Date”) |> + rename(Predicted = .value, Observed = Flow) Error: object ‘.model_desc’ not found

compare_df <- future_forecast |>
  select(Date = .index, .model_desc, .value) |>
  left_join(pf_future, by = "Date") |>
  rename(Predicted = .value, Observed = Flow)
  1. Calculating R-sq
lm_model <- lm(Observed ~ Predicted, data = compare_df)

summary(lm_model)$r.squared
[1] 0.6932388

R-squared value = 0.693…

This is a very good R-squared value, indicating that the predicted values are closr to the actual values, however, there is still some uncertainty that we can see with the data that was predicted, since the value isn’t at 1.

  1. Plotting the Predicted and Observed
ggplot(compare_df, aes(x = Predicted, y = Observed)) +
  geom_point(aes(color = .model_desc)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray40") + 
  geom_smooth(method = "lm", se = FALSE, color = "black") + 
  facet_wrap(~ .model_desc) +
  labs(title = "Predicted vs Observed Streamflow",
       x = "Predicted Monthly Flow (cfs)",
       y = "Observed Monthly Flow (cfs)",
       subtitle = "ESS330 A-22 | Neva Morgan") +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'